1 Introduction

1.1 Input files

  • (CTTI AACT) aact_studies.tsv
  • (CTTI AACT) aact_drugs.tsv
  • (CTTI AACT) aact_descriptions.tsv
  • (NextMove LeadMine) aact_drugs_leadmine.tsv
  • (PubChem) aact_drugs_smi_pubchem_cid.tsv
  • (PubChem) aact_drugs_smi_pubchem_cid2inchi.tsv
  • (ChEMBL) aact_drugs_inchi2chembl.tsv
  • (ChEMBL) aact_drugs_chembl_activity_pchembl.tsv
  • (ChEMBL) aact_drugs_chembl_target_component.tsv
  • (IDG TCRD/Pharos) pharos_targets.tsv
  • (JensenLab Tagger) aact_descriptions_tagger_matches.tsv
  • (JensenLab Dictionary) diseases_entities.tsv

nct_id is the study ID.

## [1] "Thu Apr  4 16:02:14 2019"
library(readr)
library(data.table)
library(plotly, quietly=T)

2 Input studies and drugs

2.1 Studies

Read file of all studies in AACT.

## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
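A minimal sketch of how the studies file might be read and counted, assuming a tab-separated file with a header row. A temporary file with placeholder rows stands in for aact_studies.tsv, and the columns other than nct_id are assumptions.

```r
library(readr)
library(data.table)

# Stand-in for aact_studies.tsv: a small temporary TSV with placeholder rows.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("nct_id\tstudy_type\tphase",
             "NCT_A\tInterventional\tPhase 1",
             "NCT_B\tObservational\tNA"), tsv_path)

# Read all columns as character; "NA" strings become missing values.
studies <- as.data.table(read_delim(tsv_path, delim = "\t",
                                    col_types = cols(.default = col_character())))

print(sprintf("Total studies: %d ; unique NCT_IDs: %d",
              nrow(studies), uniqueN(studies$nct_id)))
```

Comparing the row count with the number of unique NCT_IDs is a quick check that nct_id is a unique key for the studies table.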

2.1.1 Study references

References of type results_reference may offer stronger evidence and higher confidence than other reference types.

## [1] "references: 388031; NCT_IDs: 61208; PMIDs: 287758; results_references: 64880"
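The reference counts above could be produced along these lines. This is only a sketch: the table name refs, the column names (nct_id, pmid, reference_type), and the rows are assumptions for illustration.

```r
library(data.table)

# Placeholder study references; column names are assumed, not taken from AACT.
refs <- data.table(
  nct_id = c("NCT_A", "NCT_A", "NCT_B"),
  pmid = c("111", "222", "111"),
  reference_type = c("results_reference", "reference", "results_reference"))

# Count the subset of references that report results.
n_results <- refs[reference_type == "results_reference", .N]

print(sprintf("references: %d; NCT_IDs: %d; PMIDs: %d; results_references: %d",
              nrow(refs), uniqueN(refs$nct_id), uniqueN(refs$pmid), n_results))
```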

2.2 Drugs

Read file of all drugs in AACT.

  • id is AACT ID.
  • Note that one study may involve multiple drugs.
  • At this point a “drug” is identified by a name.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"

2.3 Studies: Interventional drug studies only

Select only Interventional studies (study_type) associated with drugs (via nct_id).

## [1] "Interventional studies: 237892 (79.2%)"
## [1] "Interventional drug studies: 124421 ; unique NCT_IDs: 124421"
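A sketch of the selection described above, assuming studies carries study_type and drugs carries nct_id. The rows are placeholders, and matching with %in% is one possible way to express "associated with drugs".

```r
library(data.table)

# Placeholder studies and drugs (one drug row per intervention).
studies <- data.table(
  nct_id = c("NCT_A", "NCT_B", "NCT_C", "NCT_D"),
  study_type = c("Interventional", "Interventional",
                 "Observational", "Interventional"))
drugs <- data.table(
  id = 1:3,
  nct_id = c("NCT_A", "NCT_A", "NCT_D"),
  name = c("drug one", "drug two", "drug three"))

# Step 1: keep interventional studies only.
itv <- studies[study_type == "Interventional"]
print(sprintf("Interventional studies: %d (%.1f%%)",
              nrow(itv), 100 * nrow(itv) / nrow(studies)))

# Step 2: keep those with at least one drug, matched via nct_id.
itv_drug <- itv[nct_id %in% drugs$nct_id]
print(sprintf("Interventional drug studies: %d ; unique NCT_IDs: %d",
              nrow(itv_drug), uniqueN(itv_drug$nct_id)))
```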
Drug studies and drugs, by phase
phase            N_studies  N_drugs
Early Phase 1         1574     2615
Phase 1              23603    48593
Phase 1/Phase 2       6663    13288
Phase 2              33910    68850
Phase 2/Phase 3       3305     6503
Phase 3              22988    49507
Phase 4              19593    36331
NA                   12785    29390
Drugs (itv_ids), by study overall_status
overall_status               N
Completed               145006
Recruiting               33973
Terminated               19618
Unknown status           18463
Active, not recruiting   13962
Not yet recruiting        8001
NA                        7080
Withdrawn                 6969
Enrolling by invitation   1060
Suspended                  945
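The two summaries above can be sketched as grouped counts in data.table. The table drug_studies, with one row per intervention ID per study, and its rows are assumptions for illustration.

```r
library(data.table)

# Placeholder table: one row per intervention (id) per study (nct_id).
drug_studies <- data.table(
  id = 1:5,
  nct_id = c("NCT_A", "NCT_A", "NCT_B", "NCT_C", "NCT_D"),
  phase = c("Phase 1", "Phase 1", "Phase 2", "Phase 2", NA),
  overall_status = c("Completed", "Completed", "Recruiting", "Completed", NA))

# Studies and drugs per phase; missing phase sorted last.
by_phase <- drug_studies[, .(N_studies = uniqueN(nct_id), N_drugs = uniqueN(id)),
                         by = phase][order(phase, na.last = TRUE)]

# Drugs (intervention IDs) per study status, most frequent first.
by_status <- drug_studies[, .N, by = overall_status][order(-N)]

print(by_phase)
print(by_status)
```

Counting with uniqueN rather than .N for the phase table avoids counting a study once per drug.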

2.4 Drugs by study start_year

(To do: stack with study start_year.)

## Warning: Ignoring 1 observations

## Warning: Ignoring 1 observations

2.5 Drug-trials by Phase and Status

3 NextMove Leadmine Chemical NER

AACT drug names are resolved by LeadMine to standard names and chemical structures, represented as SMILES. Note that one name may include multiple chemicals. With resolved structures, drugs can be counted in a cheminformatically rigorous way, as active pharmaceutical ingredients (APIs).

## [1] "Drug unique SMILES resolved by LeadMine: 4699 ; unique intervention IDs: 171741"

3.1 Chemical NER mentions

3.1.1 Disambiguated totals by resolved structure (SMILES)

Top 20 drugs by total mentions
N_mentions names
2637 Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol
2545 CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide; ciclophosphamide; cyclophosphamide
2461 CISPLATIN; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum; cis-platinum; cisplatin; cisplatine; cisplatinum
2070 DEXAMETHASONE; Dexamethason; Dexamethasone; Dexamethosone; Maxitrol; OZURDEX; Oradexon; Ozurdex; dexamethason; dexamethasone; dexamethosone
2054 CARBOPLATIN; Carboplatin; Carboplatine; Paraplatin; carboplatin; carboplatine
1779 DOCETAXEL; Docetaxel; docetaxel
1625 METFORMIN; MetFORMIN; Metformin; Metformine; metformin; metformine
1540 GEMCITABINE; Gemcitabine; gemcitabine
1342 CAPECITABINE; Capecitabin; Capecitabine; XELODA; Xeloda; capecitabine; xeloda
1178 Cortancyl; Lodotra; Meticorten; Prednison; Prednisone; RAYOS; prednison; prednisone
1157 0xaliplatin; Eloxatin; OXALIPLATIN; OXAliplatin; Oxaliplatin; Oxaliplatine; eloxatin; oxaliplatin; oxaliplatine
1157 METHOTREXATE; Methotrexate; Metoject; methotrexate
1086 BUPIVACAINE; Bupivacain; Bupivacaine; EXPAREL; Exparel; SKY0402; bupivacain; bupivacaine
1044 ETOPOSIDE; Etoposid; Etoposide; etoposide
1027 ADOPORT; ADVAGRAF; Adoport; Advagraf; ENVARSUS; Envarsus; FK-506; FK506; PROGRAF; Prograf; Protopic; TACROLIMUS; Tacrolimus; tacrolimus
978 NORMAL SALINE; Normal Saline; Normal saline; normal salin; normal saline
977 LIDOCAINE; LMX 4; LMX4; Lidocain; Lidocaine; Lidoderm; Lignocain; Lignocaine; Oraqix; lidocain; lidocaine; lignocaine
908 CYTARABINE; Cytarabine; Cytosar; DepoCyt; DepoCyte; Depocyt; Depocyte; cytarabine; cytosar
903 COPEGUS; Copegus; REBETOL; RIBAVIRIN; Rebetol; Ribasphere; Ribavarin; Ribavirin; Ribavirine; Virazole; rebetol; ribavarin; ribavirin
846 Diprivan; PROPOFOL; Propofol; propofol
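A sketch of how mentions might be totaled per resolved structure, with the distinct name variants collected into one field as in the table above. The mentions table and its column names are assumptions, and the rows are placeholders.

```r
library(data.table)

# Placeholder NER output: one row per mention, with matched text and SMILES.
mentions <- data.table(
  smiles = c(rep("CC(=O)Oc1ccccc1C(=O)O", 4), "Cn1cnc2c1c(=O)n(C)c(=O)n2C"),
  name = c("aspirin", "Aspirin", "ASPIRIN", "aspirin", "caffeine"))

# Group by structure; method = "radix" sorts in the C locale, so uppercase
# variants come before lowercase ones regardless of the session locale.
top <- mentions[, .(N_mentions = .N,
                    names = paste(sort(unique(name), method = "radix"),
                                  collapse = "; ")),
                by = smiles][order(-N_mentions)]

print(head(top, 20))
```

Grouping on SMILES rather than on the name merges spelling and case variants of the same compound into a single row.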

3.1.2 Chemical NER mentions resolved to structures (SMILES)

## [1] "Drugs (drug names) with resolved structure: 180555 / 197300 (91.5%)"

3.1.3 Chemical NER mentions by intervention ID

## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"

3.1.4 Chemical NER mentions by trial (NCT ID)

## [1] "Mentions by study: 92966 / 99647 (93.3%)"

3.1.5 Chemical NER mentions by drug (i.e., name in AACT)

## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"

4 PubChem

4.1 Intervention IDs to CIDs from PubChem (via SMILES)

## [1] "PubChem SMILES2CID hits: 3960 / 4698 (84.3%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153876"

4.2 InChIKeys from PubChem (via CIDs)

## [1] "PubChem CIDs with InChIKeys: 3801"
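The SMILES to CID to InChIKey chain can be sketched as a filter followed by an inner join. All identifiers below (SMI_*, the integer CIDs, KEY_*) are hypothetical placeholders, and the column names are assumptions.

```r
library(data.table)

# Placeholder SMILES-to-CID lookup results; NA marks a structure with no hit.
smi2cid <- data.table(smiles = c("SMI_A", "SMI_B", "SMI_C"),
                      cid = c(1L, 2L, NA))
hits <- smi2cid[!is.na(cid)]
print(sprintf("PubChem SMILES2CID hits: %d / %d (%.1f%%)",
              nrow(hits), nrow(smi2cid), 100 * nrow(hits) / nrow(smi2cid)))

# Placeholder CID-to-InChIKey table; CID 2 has no key in this example.
cid2inchikey <- data.table(cid = 1L, inchikey = "KEY_A")

# Inner join keeps only CIDs that have an InChIKey.
mapped <- merge(hits, cid2inchikey, by = "cid")
print(sprintf("PubChem CIDs with InChIKeys: %d", uniqueN(mapped$cid)))
```

Each step in such a chain can only lose records, which is consistent with the decreasing counts reported from SMILES to CIDs to InChIKeys.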

5 ChEMBL

5.1 ChEMBL molecule IDs, and properties (via InChIKeys)

## [1] "ChEMBL compounds mapped via InChIKeys: 3332"

5.2 ChEMBL activities for mapped compounds

Select only activities with pChEMBL values, as an indicator of confidence.

## [1] "ChEMBL activities: 124438"
## [1] "ChEMBL activities molecules: 2287 ; targets: 3832 ; documents: 16198"
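A sketch of the pChEMBL filter and the summary counts, assuming an activities table with one row per activity. The column names and all identifiers and values are hypothetical placeholders.

```r
library(data.table)

# Placeholder activities; NA means the activity has no pChEMBL value.
act <- data.table(
  molecule_id = c("MOL_A", "MOL_A", "MOL_B", "MOL_C"),
  target_id = c("TGT_X", "TGT_Y", "TGT_X", "TGT_Z"),
  document_id = c("DOC_1", "DOC_1", "DOC_2", "DOC_3"),
  pchembl_value = c(5.2, NA, 6.1, NA))

# Keep only activities that have a pChEMBL value.
act <- act[!is.na(pchembl_value)]

print(sprintf("ChEMBL activities: %d", nrow(act)))
print(sprintf("ChEMBL activities molecules: %d ; targets: %d ; documents: %d",
              uniqueN(act$molecule_id), uniqueN(act$target_id),
              uniqueN(act$document_id)))
```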

5.3 ChEMBL targets (via activities)

## [1] "ChEMBL target proteins: 3157"

6 IDG/TCRD

## [1] "ChEMBL target proteins mapped to TCRD (human): 1806"

6.1 Targets by organism (top 10)

## [1] "Organisms: 187"
Targets by organism (top 10)
organism                    N_targets
Homo sapiens                     1806
Rattus norvegicus                 529
Mus musculus                      238
Bos taurus                         98
Sus scrofa                         36
Cavia porcellus                    26
Escherichia coli K-12              19
Oryctolagus cuniculus              18
Escherichia coli                   17
Mycobacterium tuberculosis         17

6.2 Human single-protein targets only

## [1] "Human targets: 1806"
target_type                     N
SINGLE PROTEIN               1216
PROTEIN COMPLEX               247
PROTEIN FAMILY                210
PROTEIN COMPLEX GROUP          91
PROTEIN-PROTEIN INTERACTION    16
SELECTIVITY GROUP              14
CHIMERIC PROTEIN               12
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
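The organism summary and the human single-protein selection could look roughly like this. The targets table, its column names, and the identifiers are hypothetical placeholders; only the organism and target_type values follow the tables above.

```r
library(data.table)

# Placeholder targets with organism, type, and UniProt accession.
targets <- data.table(
  target_id = c("TGT_1", "TGT_2", "TGT_3", "TGT_4"),
  organism = c("Homo sapiens", "Homo sapiens", "Rattus norvegicus", "Homo sapiens"),
  target_type = c("SINGLE PROTEIN", "PROTEIN COMPLEX", "SINGLE PROTEIN", "SINGLE PROTEIN"),
  uniprot = c("UP_1", "UP_2", "UP_3", "UP_4"))

# Targets per organism, most frequent first.
by_organism <- targets[, .(N_targets = uniqueN(target_id)),
                       by = organism][order(-N_targets)]
print(head(by_organism, 10))

# Human targets, then the single-protein subset.
human <- targets[organism == "Homo sapiens"]
print(human[, .N, by = target_type][order(-N)])
human_sp <- human[target_type == "SINGLE PROTEIN"]
print(sprintf("Human single-protein targets: %d ; unique UniProts: %d",
              nrow(human_sp), uniqueN(human_sp$uniprot)))
```

Equal counts of targets and unique UniProts indicate a one-to-one mapping between single-protein targets and proteins.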

6.3 Targets by IDG Target Development Level (TDL)

## [1] "   Tchem:    733" "   Tclin:    341" "    Tbio:    140"
## [4] "   Tdark:      2"
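A sketch of how per-TDL counts with this fixed-width layout might be produced. The targets table, the tdl column name, and the rows are assumptions; the format string was chosen to mimic the output above.

```r
library(data.table)

# Placeholder human single-protein targets with a TDL label each.
targets <- data.table(
  uniprot = c("UP_1", "UP_2", "UP_3", "UP_4"),
  tdl = c("Tclin", "Tchem", "Tchem", "Tbio"))

# Count targets per TDL, most frequent first.
tdl_counts <- targets[, .N, by = tdl][order(-N)]

# sprintf is vectorized: one right-aligned string per TDL.
print(sprintf("%8s:%7d", tdl_counts$tdl, tdl_counts$N))
```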

7 JensenLab Tagger Diseases NER

Disease mentions are tagged with the JensenLab DOID entities dictionary, applied to descriptions from the detailed_descriptions table.

7.1 Disambiguated total disease mentions by DOID

Top 20 diseases by total mentions
doid N_mentions terms
DOID:4 76402 DISEASE; Disease; dis- ease; dis-ease; disease
DOID:0111161 73734 CAN; CaN; Can; can
DOID:162 28596 CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; malignant Tumor; malignant neoplasm; malignant tumor; primary cancer
DOID:9351 17274 DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus; diabetes mellitus; diabetes-mellitus
DOID:6713 16632 CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE; Stroke; cerebro- vascular disease; cerebro-vascular disease; cerebrovascular accident; cerebrovascular disease; cerebrovascular disorder; cerebrovascular syndrome; cv-a; cva; stroKe; stroke
DOID:2030 12084 ANXIETY; Anxiety; Anxiety Disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; anxiety syndrome; anxiety-state
DOID:1612 10583 BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer; breast Cancer; breast caNcEr; breast cancer; breast tumor; breast-cancer; breastcancer; mammary cancer; mammary tumor; primary breast cancer
DOID:2841 10021 ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper reactivity; bronchial hyper-reactivity; bronchial hyperreactivity; exercise induced asthma; exercise-induced asthma
DOID:3083 9782 CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive Lung disease; Chronic Obstructive Pulmonary Disease; Chronic Obstructive Pulmonary disease; Chronic Obstructive lung Disease; Chronic Obstructive pulmonary Disease; Chronic Obstructive pulmonary disease; Chronic obstructive airway disease; Chronic obstructive lung disease; Chronic obstructive pulmonary disease; Cold; chronic Obstructive Lung Disease; chronic obstructive airway disease; chronic obstructive lung disease; chronic obstructive pulmonary disease; chronic obstructive pulmonary disorder; cold; copd
DOID:9970 9303 OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity
DOID:10763 9144 HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high blood Pressure; high blood pressure; high blood-pressure; htn; hyper-tension; hypertension; hypertensive disease; hypertensive disorder
DOID:3393 6816 C-HD; CAD; CHD; CORONARY ARTERY DISEASE; CORONARY SYNDROME; CORONARY syndrome; ChD; Coronary ARtery DIsease; Coronary Artery Disease; Coronary Disease; Coronary Heart Disease; Coronary Heart disease; Coronary Syndrome; Coronary artery disease; Coronary disease; Coronary heart disease; Coronary-artery-disease; coronary Syndrome; coronary arteriosclerosis; coronary artery dis-ease; coronary artery disease; coronary disease; coronary heart disease; coronary syndrome; coronary-artery disease; coronary-artery-disease
DOID:0060145 6115 ANALGESIA; Analgesia; analgeSia; analgesia
DOID:0111084 5958 FACE; FaCE; Face; face
DOID:9352 5848 Diabetes Mellitus Type 2; Diabetes Mellitus Type II; Diabetes Mellitus type 2; Diabetes Mellitus, Type II; Diabetes mellitus Type 2; Diabetes mellitus non-insulin-dependent; Diabetes mellitus type 2; Diabetes mellitus type II; NIDDM; Non-Insulin Dependent Diabetes Mellitus; Non-Insulin-Dependent-Diabetes Mellitus; Non-insulin dependent diabetes mellitus; Non-insulin-dependent Diabetes Mellitus; Type 2 - Diabetes Mellitus; Type 2 Diabetes; Type 2 Diabetes Mellitus; Type 2 Diabetes mellitus; Type 2 diabetes; Type 2 diabetes mellitus; Type 2-diabetes mellitus; Type II Diabetes; Type II Diabetes Mellitus; Type II Diabetes mellitus; Type II diabetes; Type II diabetes mellitus; Type-2 Diabetes; Type-2 Diabetes Mellitus; Type-2 diabetes; Type-2 diabetes mellitus; Type-2-diabetes; Type-II diabetes; Type2 Diabetes Mellitus; Type2 diabetes; Type2 diabetes mellitus; diabetes mellitus type 2; diabetes mellitus type II; diabetes mellitus type-2; diabetes mellitus type2; diabetes mellitus, type 2; maturity onset diabetes; maturity-onset diabetes; non insulin dependent diabetes mellitus; non insulin-dependent diabetes mellitus; non-insulin dependent diabetes mellitus; non-insulin-dependent diabetes mellitus; noninsulin-dependent diabetes mellitus; type -2 diabetes mellitus; type 2 Diabetes; type 2 Diabetes Mellitus; type 2 diabetes; type 2 diabetes mellitus; type 2-diabetes; type 2diabetes; type 2diabetes mellitus; type II Diabetes; type II Diabetes Mellitus; type II diabetes; type II diabetes mellitus; type II-diabetes; type-2 Diabetes; type-2 diabetes; type-2 diabetes mellitus; type-2-diabetes; type-II diabetes; type-II diabetes mellitus; type-II- diabetes mellitus; type2 diabetes; type2 diabetes mellitus
DOID:10283 5056 Familial Prostate Cancer; HPC; PRostate Cancer; Prostate CAncer; Prostate Cancer; Prostate cancer; Prostatic cancer; hereditary prostate cancer; prostate Cancer; prostate cancer; prostate-cancer; prostatic cancer
DOID:8469 4985 FLU; Flu; Influenza; flu; influenza
DOID:225 4962 SYNDROME; Syndrome; syn drome; syndrome
DOID:3908 4959 NSCLC; Non Small Cell Lung Cancer; Non Small Cell Lung Carcinoma; Non Small Cell Lung cancer; Non small cell lung cancer; Non small-cell lung cancer; Non- small cell lung cancer; Non-Small Cell Lung Cancer; Non-Small Cell Lung Carcinoma; Non-Small Cell Lung cancer; Non-Small cell lung cancer; Non-Small- Cell Lung Cancer; Non-Small-Cell Lung Cancer; Non-Small-Cell lung Cancer; Non-small Cell Lung Cancer; Non-small Cell Lung Carcinoma; Non-small cell Lung Cancer; Non-small cell lung cancer; Non-small cell lung carcinoma; Non-small-cell Lung Cancer; Non-small-cell lung cancer; nSCLC; non small cell lung cancer; non small cell lung carcinoma; non small-cell lung cancer; non- small cell lung cancer; non-small Cell Lung Cancer; non-small cell Lung cancer; non-small cell lung Cancer; non-small cell lung cancer; non-small cell lung carcinoma; non-small-cell lung cancer; non-small-cell lung carcinoma; non-small-cell lung-cancer; nonsmall cell lung cancer; nonsmall cell lung cancer; nonsmall- cell lung cancer
DOID:784 4841 CKD; CKF; CRD; CRF; Chronic Kidney Disease; Chronic Kidney disease; Chronic Kidney failure; Chronic Renal Disease; Chronic kidney disease; Chronic kidney failure; Chronic renal disease; chronic Kidney disease; chronic kidney disease; chronic kidney failure; chronic renal disease; chronic renal failure syndrome; ckd; crf; renal failure chronic

7.2 Disease mentions by study

Synonymous terms are sorted by frequency within each study and DOID.

Disease mentions by study
nct_id       doid          N_mentions  disease_terms
NCT00000102  DOID:0111161           1  can
NCT00000102  DOID:0050811           1  congenital adrenal hyperplasia
NCT00000105  DOID:11338             6  tetanus;Tetanus
NCT00000113  DOID:11830            14  myopia;Myopia;nearsightedness
NCT00000113  DOID:9835              1  refractive error
NCT00000113  DOID:1432              1  blindness
NCT00000114  DOID:10584             1  Retinitis pigmentosa
NCT00000114  DOID:8499              1  night blindness
NCT00000114  DOID:8466              1  retinal degeneration
NCT00000114  DOID:4                 1  disease
NCT00000115  DOID:4447              5  cystoid macular edema
NCT00000115  DOID:13141             5  uveitis;Uveitis
NCT00000115  DOID:1686              2  glaucoma
NCT00000115  DOID:4                 2  disease
NCT00000115  DOID:0111161           1  can
NCT00000115  DOID:8947              1  Diabetic Retinopathy
NCT00000115  DOID:1432              1  visual impairment
NCT00000115  DOID:83                1  cataract
NCT00000116  DOID:4                 2  disease
NCT00000116  DOID:0111161           1  can
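A sketch of the per-study aggregation shown in the table above, with the matched terms for each DOID joined in order of decreasing frequency. The matches table and its column names are assumptions, and the rows are placeholders.

```r
library(data.table)

# Placeholder tagger matches: one row per matched term occurrence.
matches <- data.table(
  nct_id = c("NCT_A", "NCT_A", "NCT_A", "NCT_A", "NCT_B"),
  doid = c("DOID:11830", "DOID:11830", "DOID:11830", "DOID:9835", "DOID:4"),
  term = c("myopia", "Myopia", "myopia", "refractive error", "disease"))

# Count each distinct term within a (study, DOID) pair.
term_counts <- matches[, .(n = .N), by = .(nct_id, doid, term)]

# Put the most frequent term first within each pair.
setorder(term_counts, nct_id, doid, -n)

# Collapse to one row per (study, DOID), keeping that term order.
by_study <- term_counts[, .(N_mentions = sum(n),
                            disease_terms = paste(term, collapse = ";")),
                        by = .(nct_id, doid)]
setorder(by_study, nct_id, -N_mentions)

print(by_study)
```

Counting terms before collapsing keeps the frequency ordering explicit, so the leading term in disease_terms is the most common surface form in that study.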

7.3 Aggregate by study (NCT_ID)